Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runc create/run: warn on rootless + shared pidns + no cgroup #4398

Merged
merged 3 commits into from
Sep 24, 2024

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Sep 12, 2024

Shared pid namespace means runc kill (or runc delete -f) have to
kill all container processes, not just init. To do so, it needs a cgroup
to read the PIDs from.

If there is no cgroup, processes will be leaked, and so such
configuration is bad and should not be allowed. To keep backward
compatibility, though, let's merely warn about this for now.

Alas, the only way to know if cgroup access is available is by returning
an error from Manager.Apply. Amend fs cgroup managers to do so (systemd
doesn't need it, since v1 can't work with rootless, and cgroup v2 does
not have a special rootless case).

Related to #4394, #4395.

@@ -42,6 +42,7 @@ func Validate(config *configs.Config) error {
// Relaxed validation rules for backward compatibility
warns := []check{
mountsWarn,
rootlessSharedPidns, // TODO: make it an error in runc 1.3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to error out if we implement walking the process tree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to error out if we implement walking the process tree

To me it looks neither possible nor desirable.

Not possible because

  • init process might be gone already (with some other processes still running);
  • walking the tree requires freezing all the processes, as otherwise the walker will be racing with forks;

Not desirable because

  • the code will probably be very slow and resource hungry, thanks to text-based nature of /proc;
  • cgroup v1 is going to be obsolete (one day it will; fingers crossed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the only possible way to implement this process tree walking nicely (i.e. not slow and resource hungry) would be via ebpf (which has access to kernel-internal data structures), but the kernels supporting it are probably running cgroup v2 already. Even with ebpf, other issues cited above remain.

@kolyshkin kolyshkin changed the title runc create/run: warn on cgroup v1 + shared pidns + rootless runc create/run: warn on shared pidns + rootless + no cgroup delegation Sep 12, 2024
@kolyshkin
Copy link
Contributor Author

@AkihiroSuda I think we need more than was done in #4395.

First, I am a bit puzzled why we see

level=warning msg="failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation)" error="container not running"

from runc create.

Second, I guess we need to add a special case to (*Container).Signal to not log the error when cgroup v2 + systemd >= v245 is used, because in such case we only need to kill the initial process and systemd takes care of the rest (i.e. it kills all processes in a cgroup when the initial process is gone).

WDYT?

@kolyshkin
Copy link
Contributor Author

Maybe we'll also need to call c.cgroupManager.Destroy() from (*Container).Signal when systemd manager is used, as a last resort. Currently this is only done during runc delete -f but not during runc kill -9.

@lifubang
Copy link
Member

First, I am a bit puzzled why we see

Mainly because of ‘delete -f’ in teardown?

function teardown_bundle() {
[ ! -v ROOT ] && return 0 # nothing to teardown
cd "$INTEGRATION_ROOT" || return
teardown_recvtty
local ct
for ct in $(__runc list -q); do
__runc delete -f "$ct"
done
rm -rf "$ROOT"
remove_parent
}

@kolyshkin
Copy link
Contributor Author

First, I am a bit puzzled why we see

Mainly because of ‘delete -f’ in teardown?

You are right; I was sure messages from __runc (which is part of teardown) are not logged, and yet they are.

@kolyshkin
Copy link
Contributor Author

OK, validation check can't work right as it does not know whether cgroup is actually accessible. Need to log a warning later when manager.Apply fails. Reworked this PR to do just that.

@kolyshkin kolyshkin force-pushed the no-shared-pidns branch 3 times, most recently from 2c48e0a to 7168d59 Compare September 13, 2024 17:41
@kolyshkin kolyshkin changed the title runc create/run: warn on shared pidns + rootless + no cgroup delegation runc create/run: warn on rootless + shared pidns + no cgroup Sep 13, 2024
@kolyshkin kolyshkin force-pushed the no-shared-pidns branch 2 times, most recently from e930bf9 to 5c864e9 Compare September 13, 2024 18:16
@kolyshkin kolyshkin marked this pull request as ready for review September 13, 2024 18:16
@@ -580,7 +580,18 @@ func (p *initProcess) start() (retErr error) {
// cgroup. We don't need to worry about not doing this and not being root
// because we'd be using the rootless cgroup manager in that case.
if err := p.manager.Apply(p.pid()); err != nil {
return fmt.Errorf("unable to apply cgroup configuration: %w", err)
if errors.Is(err, cgroups.ErrRootless) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purpose of warning, I think this PR LGTM.
But further more, I think we should serialize this to state.json, for example with a field name noCgroup, it is useful for doing runc kill.
Please see: #4395 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking about it, too, but let's do improvement at a time, shall we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we are trying to cut 1.2.0 for some time now, and this PR is a result of a recent development in the area (a regression described in #4394 (and fixed by #4395). As I noted in #4394 (comment), I'm OK with the fix in #4395, but it would be nice to also introduce a warning; this is what this PR does.

I think we can introduce noCgroup in runc 1.3. Feel free to open an issue about it so we won't forget.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

if errors.Is(err, cgroups.ErrRootless) {
// ErrRootless is to be ignored except when the
// container doesn't have private pidns.
if !p.config.Config.Namespaces.IsPrivate(configs.NEWPID) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not put this condition in the previous if? Do you think it is more readable like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that initially, but rolled it back later since changing the code like this complicates the review (the whole block changes instead of just one line).

I will add a separate commit that does it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad; I mixed this up with something else. Updated; PTAL @rata

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ughm, I just broke everything.

This is two if statements (and an else) because we want to

  • ignore ErrRootless;
  • except there's no private pidns, in which case we want a warning;
  • return all other errors as is.

No way to do that with a single if.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I tried to collapse it and I'm not sure it is more readable than this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a question of readability. There are three different conditions, can't do it with a single if.

@kolyshkin
Copy link
Contributor Author

@AkihiroSuda PTAL

In these cases, this is exactly what we want to find out.

Slightly improves performance and readability.

Signed-off-by: Kir Kolyshkin <[email protected]>
This aids in failed test analysis by allowing to distinguish the output
of various commands being run as part of the test case from the output
of teardown command like runc delete.

Signed-off-by: Kir Kolyshkin <[email protected]>
Shared pid namespace means `runc kill` (or `runc delete -f`) have to
kill all container processes, not just init. To do so, it needs a cgroup
to read the PIDs from.

If there is no cgroup, processes will be leaked, and so such
configuration is bad and should not be allowed. To keep backward
compatibility, though, let's merely warn about this for now.

Alas, the only way to know if cgroup access is available is by returning
an error from Manager.Apply. Amend fs cgroup managers to do so (systemd
doesn't need it, since v1 can't work with rootless, and cgroup v2 does
not have a special rootless case).

Signed-off-by: Kir Kolyshkin <[email protected]>
@@ -129,14 +129,15 @@ func (m *Manager) Apply(pid int) (err error) {
// later by Set, which fails with a friendly error (see
// if path == "" in Set).
if isIgnorableError(c.Rootless, err) && c.Path == "" {
retErr = cgroups.ErrRootless
Copy link
Member

@lifubang lifubang Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is too strict? Maybe we should add a condition here:

if name == "devices" {
    retErr = cgroups.ErrRootless
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter here because there can't be a situation where devices cgroup can't be created while others can.

// the container doesn't have private pidns.
if !p.config.Config.Namespaces.IsPrivate(configs.NEWPID) {
// TODO: make this an error in runc 1.3.
logrus.Warn("Creating a rootless container with no cgroup and no private pid namespace. " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about s/no cgroup/no devices cgroup/ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such message will have a negative effect, as it will be more confusing to a user (especially in cgroup v2 case).

@kolyshkin kolyshkin merged commit a1acfcf into opencontainers:main Sep 24, 2024
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants